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We investigate the crystallization of a single, flexible homopolymer chain using transition path sampling 
(TPS). The chain consists of N identical spherical monomers evolved according to Langevin dynamics. While 
neighboring monomers are coupled via harmonic springs, the non-neighboring monomers interact via a hard 
core and a short-ranged attractive potential. For a sufhciently small interaction range A, the system undergoes 
a first-order freezing transition from an expanded, disordered phase to a compact crystalline state. Using a 
new shooting move tailored to polymers combined with a committor analysis, we study the transition state 
ensemble of an = 128 chain and search for possible reaction coordinates based on likelihood maximization. 
We find that typical transition states consist of a crystalline nucleus with one or more chain fragments attached 
to it. Furthermore, we show that the number of particles in the crystalline core is not well suited as a reaction 
coordinate. We then present an improved reaction coordinate, which includes information from the potential 
energy and the overall crystallinity of the polymer. 


I. INTRODUCTION 

In reaction to changes in their environment, polymers 
often go through large-scale conformational changes akin 
to phase transitions, which are of significance to many bi¬ 
ological processes. A simple homopolymer chain, for in¬ 
stance, collapses from an extended coil to a compact glob¬ 
ule in response to changes in the solveniP. This transition 
is continuous and can be viewed as the single chain anal¬ 
ogy of a second-order phase transition. In other cases, 
the conformational change of the polymer occurs dis- 
continuously in a first-order like transition, and distinct 
conformations can coexist with each other. Such coop¬ 
erative behavior, familiar from two-state protein folding, 
has been also observed in a single polymer chain with 
square-well interactiorP. 

The complex phase behavior of the square-well chain 
has been investigated previousl y in a number of differ¬ 
ent studies. Taylor and LipsorP^ carried out analyti¬ 
cal calculations on very small chains, and showed that 
the system’s radius of gyration is a sigmoidal function 
of temperature. Later, Zhou et al.^ performed simu¬ 
lations of longer chains showing that the increase in 
the polymer’s dimensions with temperature is more pro¬ 
nounced f or lon ger chains. More recently, Taylor, Paul 
and Bindeil^ISEl mapped out the entire phase diagram as 
a function of temperature and width of the attractive 
welP. Depending on the well width, one observes one 
of two phase transitions: a second-order coil-to-globule 
for wide wells, and a first-order coil-to-crystal transition. 
The authors also calculated the free energy as a function 
of the potential energy at coexistence. In the latter case 
a free energy barrier separating the two phases imposes 
a pronounced two-state behavior on the systenP. 

In this work, we focus on clarifying the mechanism of 
the coil-to-crystal transition of the polymer. In order to 
perform molecular dynamics simulations and hence ob¬ 
tain a realistic picture of the transition event, we have de¬ 
veloped a continuous approximation to the pure square- 
well chain. Using this model we performed transition 


path sampling (TPS) simulationP^i^ to obtain reactive 
pathways taking the chain from the expanded to the 
folded configuration. To enhance the efficiency of our 
TPS simulation, we devised a new shooting move where 
the topology of the polymer is altered prior to the actual 
shooting. We then calculated committor values for the 
configurations along the transition pathways and used 
this information as a basis for the search for a possible 
reaction coordinate based on the likelihood maximiza¬ 
tion method of Peters and TroulP^. An analysis of the 
transition state ensemble (TSE) of the system indicates 
that the number of crystalline particles in a given con¬ 
figuration is a rather poor measure of the progress of the 
transition. Similarly, the chain’s radius of gyration does 
not convey any information about the folding transition. 
However, a combination of the potential energy with the 
global order parameter Qq improves the quality of the 
reaction coordinate. 

The remainder of this paper is organized as follows. In 
Sec.|^we define the model and summarize its properties. 
Section [nT| covers the numerical methods employed and 
the details of our simulations. Results are presented in 
Sec.|IV[ and a discussion is provided in Sec.|V] Details of 
the new TPS shooting move are given in the Appendix. 


II. POLYMER MODEL 
A. Square-well chain 

The square-well polymer model co nside red previously 
in the work of Taylor, Paul and Bindeil^I^is a fully flexi¬ 
ble chain of N identical monomers. While the distance a 
between neighboring monomers is fixed, non-neighboring 
monomers interact via the square-well pair potentiaP^ 

{ 00 0 < Rij < a, 

-e a < Rij < Act, (1) 

0 Rij > Act. 
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Figure 1. Coil (left) and crystalline (right) state of the poly¬ 
mer chain for particle number N = 128, interaction range 
A = 1.05 and temperature k-sT/e = 0.438. Crystalline and 
coil-like particles are colored in red and yellow, respectively, 
while intermediate particles are colored in blue. The criterion 
for crystallinity used here is defined in Sec.|III[ 


Here, Rij = \Ri — Rj\ is the distance between the f-th 
and j-th monomer and A > 1 parametrizes the width of 
the potential well. Due to the form of the potential, the 
system has a discrete energy spectrum En = —ne, where 
n is the number of square-well overlaps in the chain. 

As recently shown by Taylor, Paul and Bindeil^, de¬ 
pending on the value of the potential width A and tem¬ 
perature T, there exist three different phases. At high 
temperatures, the polymer is in the expanded coil phase 
for all values of A. What happens if the system is cooled, 
however, depends on the width A of the attractive well. 
For wide wells, upon cooling the chain first collapses into 
a compact but disordered globule. This transition is of 
second order. Further lowering the temperature then 
leads to a first-order phase transition to a crystalline 
state. For sufficiently narrow wells (A < 1.05), the system 
directly freezes from the expanded coil to the crystalline 
state in a first-order transition. Snapshots of the coil 
state and the crystal state are depicted in Fig.[T] 
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Figure 2. Original square-well and new smoothed potential 
for A = 1.05. The harmonic spring potential (acting between 
neighboring beads) is also shown for comparison. 
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developed a smooth (differentiable) version of the square- 
well potential. The smooth potential is defined as 

u{R) = I jexp 

( 2 ) 

where we have chosen a value of a = 0.002 cr for the 
parameter which determines the steepness of the expo¬ 
nential repulsion and the width of the step at i? = Act. 
Neighboring monomers are coupled via harmonic springs 
U{R) = |(i? —ct)^ with a value of fc = 20000 for the 
spring constant. A comparison of the original square-well 
potential and its smooth variant is shown in Fig.[^ For 
our simulations, we have chosen the N = 128 chain with 
an interaction length of A = 1.05. 
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III. METHODS 


In this Section we provide a brief outline of the com¬ 
putational methods used to obtain the results discussed 
in Sec.lIVI 


B. Chain with smoothed square well 

Since we are interested in the mechanism of the coil- 
to-crystal transition, the procedure used to evolve the 
system along transition pathways must resemble the nat¬ 
ural dynamics of the system. If Monte Carlo dynamics 
is considered, this implies that only local moves can be 
used. Molecular dynamics provides a more physical (and 
computationally more efficient) way to model the time 
evolution of the system. To facilitate such simulations 
and avoid the cumbersome handling of impulsive forces 
caused by the discontinuities in the potential, we have 


A. Wang-Landau simulation 

In order to verify the equivalence of our smooth chain 
model with the square-well chain studied previouslji^ISIZ!^ 
we have performed Wang-Landau simulation^i^. In the 
Wang-Landau algorithm, one iteratively obtains the den¬ 
sity of states g{E), from which all other thermodynamic 
properties follow. This is done by performing a Monte 
Carlo simulation with the inverse of the current estimate 
of the density of states as the weight of a configuration, 
instead of the usual Boltzmann weight. A single Monte 
Carlo move is one of five distinct possibilities, selected at 
randoirP: 
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1. pivot move: a monomer i within the chain is 
selected randomly and the whole chain segment 
[z +1, iV] undergoes a random Euler rotation about 
the z-th monomer; 

2. end move: the 1st or iV-th monomer is rotated by 
a small angle about a random axis; 

3. crankshaft move: a randomly chosen monomer 
within the chain is rotated by a small angle about 
the axis defined by the line connecting the two 
neighboring particles; 

4. bond-bridging move: described in Sec. |III 5} 

5. standard displacement move: a randomly chosen 
monomer is moved by a small random displace¬ 
ment. 

One pass of the Wang-Landau simulation consists of 2N 
single moves and contains, on average, 2 pivot moves, 
2 end moves, N/2 — A crankshaft moves, N/2 bond¬ 
bridging moves and N displacement moves. For the en¬ 
ergy histogram H{En), we have used bins of unit width 
in an energy range of [—430 e, 400 e], which in order to 
speed up the computations is split into two overlapping 
energy ranges. The results are then averaged and joined 
at Ejoin = — 190e. The quality of the joint is checked 
by computing the numerical derivative of the density of 
states, which is found to agree well in the vicinity of 
Ejoin- The histogram H{En) is checked every 10'^ passes 
for flatness, and is considered flat if no bin deviates by 
more than 20 % from the average. Furthermore, as an al¬ 
ternative flatness criterion, we check for uniform growth 
of H{En) (i.e., each bin in the histogram has increased 
within a range of 20 % of the average growth since the last 
check) every 5 x 10^ passes. To keep the numbers within 
the range that can be handled by standard floating-point 
arithmetic, lng{E) instead of g{E) is stored. Our ini¬ 
tial value of the density of states is g{En) = 1 for all En, 
with an initial modification factor of /o = e^. Subsequent 
modification factors are chosen as fm+i = \fjm- 

Once the density of states is known, one can calculate 
the canonical partition function 

Z{T) = Y,g{E^)e-P^- (3) 

n 

and the probability density of the energy, 

P(F„,r) = ^5(£;„)e-^^’'. (4) 

Here, (3 = I/ZcbT is the inverse temperature. With that, 
any quantity which is a function of the energy can be 
calculated, in particular the specific heat 

C{T) = - {E)% (5) 

where 

{E)=Y,EnPiE^,T). ( 6 ) 


For a hnite system, maxima in C(T) are related to phase 
transitions or other structural rearrangementJ^^. 


B. Transition path sampling with core modification 

In order to gain insight into the transition mechanism, 
we have performed transition path sampling simulations 
for a temperature at which the coil and crystal coexist. 
At this particular temperature, the equilibrium probabil¬ 
ity distribution of the energy is bimodal with two peaks 
of equal weight for the coil and crystallite, respectively. 
In between, the probability is extremely low, which cor¬ 
responds to a high free energy barrier separating the two 
types of configurations. Hence, in an equilibrium molecu¬ 
lar dynamics or Monte Carlo simulation, one would prac¬ 
tically never see a direct transition from the coil to the 
crystallite or vice versa. We overcome this time scale 
problem by using transition path sampling. In contrast 
to Wang-Landau sampling, TPS is able to provide dy¬ 
namical pathways taking the system from one state to 
another. 

Our TPS simulations use aimless shootinand a flex¬ 
ible path length. We employ Langevin dynamics with 
a time step At = 0.0002 sJmaPJe and a damping con¬ 
stant 7 = ^ where the integration of the 

equation of motion is performed using the Langevin ther¬ 
mostat by Schneider and StolP^ implemented in a mod¬ 
ified version of LAMMP^i^. The distinction between 
the states A (expanded coil) and B (frozen crystallite) 
is made based on the potential energy of the system. 
A configuration is considered to be in the coil state if 
U/N > U^\^/N = —0.7e and it is considered to be in 
the crystalline state if U/N < Umax/N = —2.6e. We 
check if one of these states is reached every rzcheck = 2000 
time steps, which is also the time interval for saving the 
current snapshot along the path. The shooting point 
is selected with equal probability from Xq°\ ^o+at 

Xq°2at’ where denotes the shooting point of the pre¬ 
vious trajectory and AT = 50 x zicheck At. No restric¬ 
tions are placed on the length of the pathways. A path 
is accepted if it is reactive, i. e., A —)■ B or B A, and 
rejected if it is not. 

One major problem when performing a standard tran¬ 
sition path sampling simulation of the polymer chain is 
that the decorrelation between subsequent pathways is 
very slow. In particular, subsequent pathways generated 
with the standard shooting move have almost identical 
crystalline cores in the transition state region. A so¬ 
lution of this problem is to modify the shooting point 
prior to each shooting move more than by just modify¬ 
ing the particle velocities. In particular, at each shoot¬ 
ing point w e per form a number of bond-bridging Monte 
Carlo move^^*^ in addition to a random assignment of 
the particle velocities. This bond-bridging moves modify 
also the crystalline core, leading to a faster decorrelation 
of the pathways. 
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In order to maintain the equilibrium distribution of 
states, the bond-bridging move has to be performed at 
the same temperature as the actual transition path sam¬ 
pling simulation. In this move, one first chooses the first 
{i = 1) or last {i = N) monomer of the polymer with 
equal probability. Then, all interior monomers (z > 3 or 
i < N —2) within a distance of 2 cr of the chosen end are 
identihed. One of these monomers is chosen at random 
and then connected to monomer 1 (or N) via removal 
of monomer i — 1 (or z -I- I) and re-insertion between 
monomer i and the selected end at a random azimuthal 
angle. The re-insertion is furthermore done such that 
none of the bonded distances changes. The move is then 
accepted with probability 


-Pacc(a—&) = min ^l,e , (7) 

where the potential energy change A?7 = Ub — Ua 
stems only from the change in the potential for the non¬ 
neighboring sites, ba and bb are the numbers of possi¬ 
ble bridging partners in the initial and hnal state, and 
Ra = Rii (or RiN) is the distance between monomer 
z and the selected end, with Rb defined accordingly for 
the final state. A detailed description of the algorithm 
adapted to smooth bonding potentials is provided in 
Appendix It is worth noting that due to the bond¬ 
bridging moves carried out at the shooting point, the 
newly generated pathway does not intersect with the pre¬ 
vious path at the shooting point. Still, each individual 
path is continuous in both space and momentum. 


C. Particle classification 


D. Overlap between configurations 


In order to investigate the effectiveness of the core¬ 
modification shooting move in generating new trajecto¬ 
ries, we introduce a measure for the overlap between sys¬ 
tem configurations. The key idea is that two configura¬ 
tions are classihed as similar if they share contacts be¬ 
tween the same particles. Here, we use the matrix of pair 
energies as a measure of contact. Due to the shape of the 
pair potential, typically the pair energies in units of e are 
very close to —I (in contact) or 0 (not in contact). For 
two configurations, represented by contact matrices X 
and F, we define the overlap as 


CXY = 


y. XiiYin 


E y2 Y 

i,j \ ^ i 


( 8 ) 


where z and j run over all monomers in the system. Note 
that by taking the element-wise product of the matrices, 
only connections which are present in both configura¬ 
tions give a contribution to the overlap measure. Also, 
the normalization corrects for the total number of con¬ 
tacts present and ensures that the overlap between two 
identical configurations is 1. Taking this idea one step 
further, we define a correlation function, which measures 
the average overlap over a time series of configurations, 
for example the shooting points of a long TPS simulation, 


z{n) = 


{6X{0) ■ SX{n)) 
(5X2) • 


(9) 


Here, the matrix 5X{n) = X{n) — {X) is the deviation 
of the contact matrix at step n from its average value 
and the product is meant in the sense of Eq. (§. By 
definition, c(0) = I. 


In order to distinguish between coil-like and solid parti¬ 
cles, we use the connection c oeffici ents dij based on Stein- 
hardt bond order parameter JiSEll Por two particles z and 
j close to each other (distance smaller than 1.05cr), the 
connection coefficient dij is defined as the scalar product 
of the complex qq vectors of particles z and j expressed in 
terms of spherical harmonicJ^^. The scalar product mea¬ 
sures the correlation between the local environments of 
particles z and j. We then define two adjacent particles 
z and j as connected if dij > 0.5. Furthermore, let us 
denote the number of (bonded and non-bonded) neigh¬ 
bors and the number of connected neighbors of particle z 
with Nn{i) and Nc{i), respectively. Particle z is defined 
as crystalline if > 5 and N^. > N„ — I. This combina¬ 
tion of two conditions ensures that surface particles with 
a reduced number of adjacent particles can be detected 
as crystalline. At the same time, particles deep within 
the core of the polymer are not incorrectly detected as 
crystalline if they belong to the compact, but unordered 
phase. In addition, a particle is defined as coil-like if 
Nn < 4 regardless of the value of Nc- We classify par¬ 
ticles as intermediate if they are neither crystalline nor 
coil-like. 


E. Committor analysis 

For a system with two (meta-) stable states A and B, 
the committor pnij) of configuration r is the fraction 
of dynamical pathways started from r that first reaches 
state i^. In practice, one launches a number of trajec¬ 
tories starting with random momenta from r and counts 
the fraction of trajectories ending in B. Our committor 
calculations were performed according to the algorithm 
described in Refill using = 100 and X„iax = 500. 

F. Reaction coordinate analysis 

To find a reaction coordinate capable of quantifying the 
progress of the transition we follow the likelihood max¬ 
imization approach proposed by Peters and TroulAl. In 
this method, a proposed reaction coordinate r is modeled 
as a linear combination of m physical parameters qk'- 

m 

+ <ao, 


( 10 ) 
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where q = {^fc}. The committor ps is assumed to be a 
sigmoidal function of this model reaction coordinate, 

Pb( 0 = ^[1+ tanh(r)]. (11) 

The coefficients are then chosen such that the likeli¬ 
hood 


B A 

Ha) =nil - ( 12 ) 

k k' 

is maximized. In the above equation, the first product 
runs over all single shooting events ending in state B, 
while the second product runs over all shooting events 
ending in state A. The likelihood L{a) quantifies the 
compatibility of the proposed model with the measured 
committor values. We use the Bayesian information cri- 
teriorPi^ 


BIC = —2 In L{a) -I- (m -I-1) ln(n) 


(13) 


to compare the optimization results for different numbers 
of optimization parameters, where smaller BIC values are 
better. Here, n is the total number of observations, i. e.. 


the total number of shooting events entering in Eq. (121, 


and m is the number of free parameters entering the 
model. The BIC penalizes models with too many free 
parameters, hence it is used to check whether it is sensi¬ 
ble to add additional physical parameters to improve the 
model reaction coordinate. 

To carry out the reaction coordinate analysis, we first 
calculate the committor for a number of states randomly 
selected from transition pathways. We then calculate 
a set of collective variables for each state. In order to 
obtain the optimal combination of collective variables, 
we first determine the single variable that maximizes the 
likelihood defined above. Then, we maximize the likeli¬ 
hood for each 2-variable combination of the selected vari¬ 
able and all the remaining other variables. The procedure 
is repeated successively adding variables. Combinations 
of up to 3 variables are considered. 

For a perfect reaction coordinate r(q) all configura¬ 
tions with the same value o f r s hould have the same 
committor, psiH = Pb [r(q)J22123|_ particular, there 
should exists a value r* of the reaction coordinate such 
that all configurations with r* have a committor of 
Pb = 1/2, i. e., they have equal likelihood to relax into 
state A or B. These configurations are transition states 
and together they form the transition state ensemble 
(TSE). Accordingly, the distribution of committor val¬ 
ues for states with r(q) = r* should be strongly peaked 
around a value of 0.5. For a perfect reaction coordinate, 
the only deviation from ps = 0.5 is due to the statistical 
error that arises in computing the committor from a fi¬ 
nite number of trajectories. Therefore, the width of the 
peak around pb = 0.5 decreases with increasing number 
of trajectories used to calculate the committor for each 
configuration. 



k^T/e 

Figure 3. Specific heat per monomer C{T)/NkB.T for the 
square-well model and its smooth modification. For the 
smooth chain, the freezing peak is located at k-aT/e = 
0.438 ± 0.001, while the value for the square-well chain is 
k^Tje = 0.446 ±0.001, in accordance with the result from 
Taylor etaP. Inset: density of states In g{E). Note that for 
the pure square-well chain, no positive energies are possible 
due to the absence of (finite) positive potentials. Both curves 
are normalized such that In ^(0) = 0. 


IV. RESULTS 


A. Density of states and heat capacity 


In order to relate the freezing transition of the smooth 
polymer chain to that of the pure square-well chain, we 
first need to verify that its phase behavior is similar to 
that of the original model. Hence, we have calculated 
the specific heat C{T) for both the original square-well 
chain as well as for its smooth variant using the density of 
states (Fig.[^ obtained by Wang-Landau sampling. Two 
important observations can be made. First, the overall 
structure of the two curves is almost identical, especially 
in the important region corresponding to the freezing 
transition. For the smooth model, however, the peak 
is shifted slightly to lower temperatures, and the whole 
curve has a higher value away from the peak. This can 
be explained by noting that the harmonic springs in the 
new model introduce additional degrees of freedom not 
present in the original polymer. Therefore, the specific 
heat is increased, and the system needs a lower temper¬ 
ature for the freezing transition to occur. In Fig.[^ we 
have plotted the probability distribution of the energy 
and the free energy profile at the freezing temperature. 
The observed barrier height of roughly 19 k^T agrees well 
with the result from Taylor et al. for the pure square-well 
chairP. 
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Figure 4. Free energy (solid line) and probability distribu¬ 
tion (dashed line) as a function the potential energy at the 
freezing temperature k^T/e = 0.438. The two peaks cor¬ 
respond to the crystalline and coil state, respectively. The 
stable state boundaries used in our TPS simulations are indi¬ 
cated by dashed vertical lines. The configurations shown are 
snapshots of the coil (right) and crystalline (left) states and 
the top of the free energy barrier (center). 



Figure 5. (Top) Two shooting point states with a practi¬ 
cally identical crystalline core and cxy = 0.407. (Bottom) 
two shooting point states from the same TPS simulation with 
lower degree of overlap, cxy ~ 0.072. The program gmmre^^ 
has been used for the alignment procedure. 


B. Efficient sampling of reactive pathways 

Some examples of shooting point configurations ob¬ 
tained from a TPS simulation are shown in Fig.[^ Even 
from visual inspection it is clear that the two configura¬ 
tions with a high overlap value share a practically iden¬ 
tical crystalline core. For better visibility, the configura¬ 
tions have been spatially aligned using the program gmm- 
reg, which implements a point set registration scheme 



Figure 6. Correlation function c(n) as a function of the num¬ 
ber of TPS iterations carried out with (red) and without 
(blue) the use of the core-modification. In the main panel, 
the correlation function is evaluated at the shooting points 
and in the inset at the final states of the transition pathways. 


based on gaussian mixture model^^. In Fig.[^ we have 
plotted the correlation function c(n) as a function of TPS 
cycles as calculated during a TPS run. Correlation func¬ 
tions for the shooting points as well as for the final folded 
states of the reactive pathways are shown. It can be 
clearly seen that the use of the core-modification prior 
to each shooting is vital to de-correlate the state within 
a reasonable amount of time. Otherwise, even after 100 
TPS cycles, there is still a considerable amount of over¬ 
lap between shooting points and even folded states. In 
other words, without core modification the simulation is 
stuck in a single class of similar folding pathways which 
all lead to final states with identical cores. This is no 
surprise since already Taylor, Paul, and Bindei!^ observed 
that for bigger chains also the Wang-Landau simulations 
do not converge without the use of bond-bridging moves. 
Similarly, one must employ bond-bridging moves at the 
shooting point when performing a TPS simulation in or¬ 
der to overcome the barrier in trajectory space. Note 
also that in the calculation of c(n), a normalization fac¬ 
tor {X'^) — {Xy occurs. It is important to realize that 
these two averages are different for shooting points and 
folded states, as there will be more connections present in 
the folded states compared to the shooting points. Fur¬ 
thermore, when just calculating (X^) and {X}'^ from a 
single time series of shooting points, one will make an 
error, as these points along a time series are not properly 
de-correlated. Therefore, we have performed a number 
of independent TPS runs, each started from already de- 
correlated paths, to obtain correct values for {X'^) and 
{X)\ 
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Figure 7. Some configurations from the transition state en¬ 
semble. 



Uje 

Figure 8. Flistogram of the potential energy for configurations 
in the transition state ensemble (red) fitted with a Gaussian 
distribution (blue). Insets: Same for the global order param¬ 
eter Qe (top) and the number of crystalline particles in the 
core Afcore (bottom). 


C. Committor analysis and transition state ensemble 

One immediate result of our committor calculations is 
the transition state ensemble. We define a state to be an 
element of the TSE if its committor differs from 0.5 
by no more than the statistical uncertainty estimated as 

ApB = \/pb(1 -pb)/M. (14) 

Here, M is the number of realized shootings used for the 
estimation of ps. In total, we have harvested 210 transi¬ 
tion states, a few of which are depicted in Fig.[^ Our 
first result from analyzing the TSE is an observation 
made earlier by Taylor, Paul, and Binder^ for the pure 
square-well chain: typical transition states consist of a 
crystalline nucleus with one or more chain fragments at¬ 
tached to it. The result of Taylor et al. was drawn from a 
visual inspection of states which are energetically in be¬ 
tween the high-energy coil and the low-energy crystalline 
phase. Our committor analysis confirms this observation 
with a more rigorous approach. In Fig.[^ we have plotted 



Figure 9. (Red) Committor distribution for states with po¬ 
tential energy U/e = —175.6 ±5.0, corresponding to the max¬ 
imum in Fig.|^ (Blue) Committor distribution for states with 
an value of r = 0.0 ± 0.15 for the optimized reaction coordi¬ 
nate defined in Eq. (101. The total number of states is about 
the same for both histograms. The solid lines are fits with 
gaussian functions. 


the distribution of the potential energy, the global order 
parameter Qq, and the number of crystalline particles in 
the core for states in the TSE. The core is defined as the 
largest cluster of crystalline particles. The energies of 
configurations belonging to the TSE are between —220£ 
and —140 s, roughly corresponding to the barrier region 
in Fig.|^ The broad distribution seen in Fig. [^indicates 
that the potential energy is not well suited for an accu¬ 
rate description of the progress of the transition. This 
observation is confirmed by the committor distribution 
(Fig.|^ for states from the transition path ensemble with 
energy Eje = —175.6 ± 5.0 corresponding to the peak of 
the energy distribution shown in Fig.|^ Instead of being 
sharply peaked around a value of 1/2 as one would ex¬ 
pect for a good reaction coordinate, the distribution is 
broad and includes committor values from 0 to 1. 

As seen in Fig. 10 the committor along a typical (fold¬ 
ing) transition path is not simply a monotonically in¬ 
creasing function of time. Instead, starting at a value of 
0 at the stable state A, it oscillates up and down several 
times before finally reaching 1 at the stable state B at 
the end of the trajectory. This is indicative of the rough 
and diffusive nature of the freezing transition. 

From a two-dimensional free energy landscape as a 
function of the total energy U and the squared radius of 
gyration Taylor et al.l^ identified the dominant folding 
pathway as the minimum free energy path. Our results 
strongly suggest that such a dominant folding pathway 
is only representative in an average sense, especially re¬ 
garding the squared radius of gyration. In Fig. E] we 
have plotted the committor values for the configurations 
from our harvested transition paths in the U-R^ plane. 
While the committor is clearly smaller for higher energies 
and bigger for lower energies, there is almost no structure 
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Figure 10. Time evolution of the committor pb (blue), the 
potential energy U (red), the number of (crystalline) particles 
in the core Ncoie (green, scale not shown), and the radius of 
gyration Rg (black, scale not shown) along a typical transition 
path. 



Uje 


Figure 11. Scatter plot of the committor ps as a function of 
the potential energy U and the squared radius of gyration R^. 
The dots are colored according to the corresponding commit¬ 
tor value according to the color code shown in the bar at the 
right. Note that in the vicinity oil] je = —200, there are red 
dots (pb = 1) next to blue dots (pb = 0). 


at all in the direction. Also, states with committor 
values near 0 and 1, respectively, can be found next to 
each other along this axis. This implies that the combi¬ 
nation of U and B?g is a rather poor choice for predicting 
committor values and describing reaction pathways. As 
an illustrative example, one might think of a case where 
a long coil-like fragment is attached perpendicularly to a 
crystalline core. The value of i?g for such a configuration 
will be rather large. However, after just one pivot move 
about one of the particles within the coil-like fragment, 
i?g might tremendously decrease by rotating a significant 
part of the chain nearer to the core. At the same time, 
it is rather likely that neither the total energy nor the 
committor have changed at all after such a move. 


D. 


likelihood 1 likelihood 2 likelihood 3 


u 

-501687 

-501687 

-490654 

^core 

-523682 

-495948 

-490654 

Qe 

-631293 

-490654 

-490654 

7 

-671208 

-501275 

-490528 

Qr" 

-682370 

-500601 

-490654 

Rg 

-713280 

-498237 

-482159 

h 

-735140 

-500141 

-490654 

h 

-735190 

-497896 

-482397 

h 

-752408 

-498003 

-487678 

gf" 

-787806 

-500086 

-490654 

a 

-798601 

-496548 

-482170 

Qi 

-808050 

-495809 

-490649 

QT" 

-818773 

-499298 

-490580 

Qr^ 

-826644 

-496584 

-490554 

BIC 

1003386 

981326 

964342 


Table I. likelihood 1: log likelihood scores for the different sin¬ 
gle collective variables, likelihood 2: log likelihood scores for 
the combination of U with the given variable, where the opti¬ 
mization was performed over the coefficients for both U and 
the other variable, likelihood 3: the same for the combination 
of {U, Qe) with the given variable. The maximum for each 
combination is highlighted in bold type. Also the Bayesian in¬ 
formation criterion for each combination with maximum like¬ 
lihood is given. Note that smaller BIC values are better. 


The main objective of the search for a reaction coor¬ 
dinate is to find a - preferably simple and transparent - 
function of the system variables that encodes the progress 
of the transition and reduces many correlated degrees of 
freedom to a single, essential one. However, it is clear 
from Figs.[8| and [TOl that neither the total potential en¬ 
ergy U nor the number of particles in the crystalline 
core Acore alone is able to serve as a reaction coordi¬ 
nate. Nonetheless, one might hope that a combination of 
these two quantities, possibly including some additional 
physical parameters, could lead to better results. 

We have tested the quality of several collective vari¬ 
ables as a reaction coordinate by computing the like¬ 
lihood L(a) from Eq. (12| for optimized coefficients a. 
It turns out that when one uses only one variable as a 
model reaction coordinate, U gives the best likelihood 
(see Tab.|l]). Furthermore, when including an additional 
variable and performing the optimization over both U 
and the second variable, the use of the global order pa¬ 
rameter Qe gives the best improvement. The improve¬ 
ment is better than with the inclusion of Ncore-, even 
though, for single variables, A<.ore has a higher likelihood 
score. However, since the energy is dominated by the 
number of non-permanent bonds, and most bonds occur 
in the crystalline core, it is clear that U and Acore are 
highly correlated. Therefore, A^ore can add only very 
little information that is not already present in U, and 
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consequently, the variable pair (?7, A^core) gets a lower 
likelihood score than ([/, Qq). Iterating the procedure 
one step further, we find that including the radius of gy¬ 
ration Rg improves the likelihood the most. However, 
the relative improvement for a third variable is rather 
small, confirming our previous observation that the ra¬ 
dius of gyration does not carry much information about 
the crystallization mechanism. Furthermore, both the 
anisotropy a = / 3//1 — 1 of the configuration, with the 
moments of inertia = /max and R = /min, as well as /a 
alone, give a very similar likelihood score when used as 
a third variable in the optimization. In order to test the 
robustness of the procedure, we have performed the opti¬ 
mization procedure for smaller subsets of the full data set 
of known committor values. While U and Qe are ranked 
first and second consistently, the third variable is often 
different for each run. For example, in one case R-, R, 
and a were equally well suited as a third variable, giv¬ 
ing almost identical likelihood scores. This implies that a 
third variable in addition to the combination (U,Qq) does 
not improve the quality of the optimized reaction coor¬ 
dinate in a significant way. The log likelihood scores for 
the full dataset are listed in TablelH Other collective vari¬ 
ables considered in the optimization were the order pa¬ 
rameters Q 4 and Qe evaluated for core 
peripheral particles (Q 4 “',( 56 '*”) only, and the biggest 
eigenvalue 7 of the contact matrix (as used by Ruzicka 
and coworkerd^. The contact or Laplacian matrix is 
defined as 

{ —I \i — j| > I and Rij < Act, 

0 \i — j\> 1 and Rij > Act, 

0 = 

~ ^kj — j\ = 0- 

(15) 

In our work, we have actually used the matrix of pair 
energies instead, which is very similar to the above defi¬ 
nition due to the shape of the pair potential. Note that as 
observed by Ruzicka and coworkers previously, it makes 
little difference for the behavior of 7 whether one uses 
the above definition for the diagonal elements of the ma¬ 
trix, or sets them to zero as it is the case when using the 
pair energies. The committor as a function of the opti¬ 
mal reaction coordinate constructed from U, Qe and Rg 
is shown in Fig.[^ Although this reaction coordinate 
yields the best likelihood score, there remains a consid¬ 
erable spread in the committor for any particular value 
of the reaction coordinate. 

In Fig.|^ we have plotted the committor distribution 
for all states with a value of r = 0.0 ± 0.15 for the op¬ 
timized reaction coordinate. For both the potential en¬ 
ergy and the optimized RC, the distribution has a small 
trough around pb — 1/2. Since all states were selected 
at random, a likely explanation is that the system simply 
does not spend a long time at committor values around 
1/2. If the proposed reaction coordinate is of poor qual¬ 
ity, the committor distribution for the transition value 
of the RC will look similar to the unrestricted distri- 



Figure 12. Committor pb as a function of the optimum re¬ 
action coordinate r, which is a linear combination of the po¬ 
tential energy U and the global order parameter Qg. The 
red line is the model reaction coordinate of Eq. (111. Inset: 
Committor as a function of U alone. 


bution, which also has a trough around 1/2. However, 
compared to the distribution for potential energy values 
of U/e = —175.6 ± 5.0, for the optimized RC one sees a 
slight improvement because the width of the histogram 
has decreased due to the optimization, while the peak of 
the distribution is shifted towards a value of 0.5. In par¬ 
ticular, Gaussians fitted to the histograms have a mean 
of 0.64 with a standard deviation of 0.29 for the poten¬ 
tial energy as reaction coordinate and a mean of 0.54 and 
a standard deviation of 0.25 for the optimized reaction 
coordinate. 


V. DISCUSSION 

For sufficiently small interaction ranges A, the square- 
well polymer chain shows a two-state folding transition 
from the extended coil directly to the crystalline state. 
Using TPS simulations with a new core modification 
move, which has proved crucial for the ergodic sampling 
of transition pathways, we have studied the folding mech¬ 
anism in detail. A committor analysis of the harvested 
transition pathways has confirmed the earlier observation 
that transition states have a single crystalline nucleus, 
while the rest of the system is still in a coil-like configu¬ 
ration. While the fully crystalline state and the coil state 
can be distinguished based on the total potential energy, 
which essentially counts the number of close contacts be¬ 
tween monomers, the energy is not well suited as reaction 
coordinate. The potential energy of transition states is 
distributed around U/s = —176, which also coincides 
with the maximum of the free energy curve at the tran¬ 
sition temperature. However, the broad distribution of 
committor values at U/e = —176 shows that the poten¬ 
tial energy does not accurately quantify the progress of 
the crystallization transition. Using the likelihood max- 
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imization method of Peters and Troutlii^, we have con¬ 
structed an optimized reaction coordinate, which is a 
linear combination of the the potential energy and the 
global order parameter Qe of the polymer. This opti¬ 
mized reaction coordinate captures the progress of the 
folding transition more accurately than the energy alone 
or any other combination of two variables we tested, in¬ 
dicating that variables describing the structure and the 
overall shape of the polymer are needed in order to under¬ 
stand the transition mechanism. However, the improve¬ 
ment of the reaction coordinate obtained by including Qe 
is marginal. 

The question arises whether the optimum reaction co¬ 
ordinate identified for the specific model studied in this 
paper is transferable to other polymer models. In our 
model with short-range attractions and strongly repul¬ 
sive cores, the potential energy works as order parameter 
because it is proportional to the number of contacts. In 
the case of polymers with additional contributions to the 
potential energy, such as torsional and angular potentials, 
it will be more appropriate to use the number of contacts 
directly rather than the potential energy. Similarly, the 
order parameter Qg is sensitive to a close-packed struc¬ 
ture, which occurs in the crystalline state of the square- 
well chain. For more complex polymers, it is likely that 
one will get better results when using an order parameter 
which is sensitive to the particular ground state structure 
of the system under consideration. 

An interesting aspect of of the transition has been 
noted recently by Ruzicka, Quigley and AlleiP^ in for¬ 
ward flux sampling simulations of a slightly modified ver¬ 
sion of the chain allowing for the application of collision 
dynamics. They have analyzed the distribution of the 
largest eigenvalue 7 of the polymer’s Laplacian matrix. 
This variable exhibits a different distribution depend¬ 
ing on the folding probability. More specifically, folding 
pathways with a high folding probability, as well as un¬ 
folding pathways, show a bimodal distribution of 7 , while 
the equilibrium distribution for the same temperature is 
unimodal. The second peak occurs at a higher value of 7 , 
which also corresponds to configurations with an already 
ordered crystalline nucleus. 

While 7 carries a signature characteristic for the two 
stable states, it does not convey much useful informa¬ 
tion about the progress of the transition. The features in 
the distribution of this variabl^^ have a straightforward 
explanation: Configurations which have a finite folding 
probability already need to have some degree of crys¬ 
tallinity, otherwise they will not fold even if their energy 
is rather low. In other words, while 7 carries enough in¬ 
formation to decide whether there is some or zero folding 
probability, the variable cannot be used to make an accu¬ 
rate prediction of the actual committor value if it is any¬ 
thing other than zero, and, therefore, does not perform 
well as a reaction coordinate. We have confirmed this re¬ 
sult even when we include several or even all eigenvalues 
of the contact matrix in the reaction coordinate approxi¬ 
mation. A possible explanation is that even the full set of 


eigenvalues suffers from the same flaw as 7 alone: Similar 
to other global order parameters, they are only a mea¬ 
sure of the overall crystallinity present in the system. By 
construction, they are completely symmetric under re¬ 
ordering of the particles. However, in a polymer chain, 
which has linked neighbors, the order of the monomers 
actually does matter, a fact that is completely neglected 
when using such measures of crystallinity. Therefore, it 
remains a challenging task to construct better order pa¬ 
rameters for homopolymers, which take into account the 
actual order along the chain, while still being symmet¬ 
ric under other operations, such as reversing the label¬ 
ing without changing the connectivity. More elaborate 
machine learning approaches such as support vector ma¬ 
chines may be helpful in this endeavor. 
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Appendix A: Bond-bridging move with variable bond 
lengths 


The bond-bridging move used in our TPS simulation 
works as follows. First, end particle I or V is chosen with 
equal probability. Then, all interior monomers (z > 3 or 
i < N — 2) within a distance of 2 L of the chosen end 
are identified, and one of these, denoted by i, is chosen 
at random. This monomer is reconnected to the end via 
removal and re-insertion of the next chain neighbor in 
the direction of the end, namely particle z—Iorz-|-l. A 
schematic view of this procedure is given in Fig.[^ The 
acceptance probability for the move is 


Cacc(a b) 


min 


1 , 


p{b) bg Rb 
p{a) bb Ra 


(Al) 


Here, ba {bb) stands for the number of possible bridging 
partners present in state a (b) and Ra is the distance of 
monomer i to the previously selected chain end, with Rb 
defined accordingly, p is the equilibrium distribution to 
be sampled. As usual, the value of p{b)/p{a) depends 
on the type of simulation that is performed: We have 
p{b)/p{a) = for a canonical ensemble at in¬ 

verse temperature /3 and p{b)/p{a) = g{Ea)/g{Eb) in the 
case of a Wang-Landau simulation. 

As we will discuss in the following, particular care has 
to be taken in deriving the acceptance probability, be¬ 
cause in our modified square-well polymer, bond lengths 





11 



Figure 13. Schematic 
move. 




are allowed to fluctuate. The move is designed such that 
none of these distances are changed, in other words, the 
change in the potential energy from the harmonic springs 
is always zero. 

In all the following, we will assume that particle 1 has 
been selected as the end to be reconnected. The situation 
for end particle N is of course identical, but i — 1 has to 
be replaced by i + 1 and i — 2 hy i + 2 etc. Let us also 
denote the distances to keep fixed with i?+ = \Ri — Ri-i\ 
and R- = — Ri- 2 \- The chain is then re-connected 

as indicated in Fig. 14 We have i?a = d+ + d-, and can 
calculate 


d_ 


p2 p2 I p2 

■ 2i4 ^ 


(A2) 


Note that due to the variable bond lengths, it is possible 
that i?+ -I- i?_ < Ra- In that case, there is no possibility 
to perform the re-connection, and the move is rejected. 
Detailed balance for the move can be satisfied with the 
usual Metropolis acceptance criterion 


-Pace (a b) = min 


1 , 


p{b) Pgenjb -> g)' 
p{a) Pge„(a -t 6)J ’ 


(A3) 


where p{x) is the equilibrium distribution to be sampled 
and Pgen (a —> b) is the probability to generate configura¬ 
tion b out of a. Hence, we need to know the generation 
probability in order to get the correct acceptance rule for 
the bond-bridging move. The generation probability can 
be written as a product of three factors: 


Pgen(^^ ^ 


1 1 1 
2 ba n{b) 


(A4) 


Here, 1/2 arises from the fact that there are two ends 
to choose from and ba is the number of possible bridging 
partners present in state a. Furthermore, n(b) is the num¬ 
ber of possible configurations for b once the other choices 
have already been made. This number is proportional to 
the configuration space volume available for the choice of 
b under the imposed constraints: n(b) = cAv{b). In or¬ 
der to estimate Av{b) for the given geometry, we have to 
realize that a constraint on a distance is constructed us¬ 
ing a delta function in the distribution function. In other 
words, 6 {R — R') actually means that R' < R < R' + AR, 
where Ai? is infinitesimal. In the case of our move, the 


Figure 14. Distances and angles involved in the bond-bridging 
move. Here, particle 1 has been chosen as the end to be 
reconnected, therefore particle i — 1 is re-inserted between 
particle 1 and particle i. 



Figure 15. Close-up of the geometry at the particle insertion 
point. 


distances to be kept fixed are i?+ and i?_. In Fig. 15 we 
have illustrated the geometry at the re-insertion point, 
with the parallel lines indicating the infinitesimal con¬ 
straints. The available configuration space volume is 
proportional to the (also infinitesimal) cross section AA. 
Since the azimuthal angle of the re-insertion is random, 
this has to be multiplied with the circumference of the 
circle defined by the rotation of the re-insertion point 
around the axis from 1 to f: 


Av = 27rsAA. (A-5) 

The actual value for AA can be calculated by looking 
at the geometry at the re-insertion point (Fig.[^. We 
have 


^ , AR+AR^ 

AA = -±, 

sin a 


(A 6 ) 


Also, a = TT — /3, therefore sin a = sin/I. To get an 
expression that only includes known distances, we first 
split the triangle defined by the positions of particles f, 
i — 1 and 1 into two right triangles: j5 = 7 + -I- 7 _. Hence, 


sin/3 = sin( 7 + -|- 7 _) 

= sin 7 + cos 7 _ -I- sin 7 _ cos 7 +. (A7) 


The sines and cosines can now be expressed as ratios of 
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known lengths: 


d+ 


d- 

sin7+ = —, 

R+ 

sin 7 _ 

~ rI 

s 


s 

COS7+ = —, 

R+ 

cos 7 _ 

~ 

Putting all together, we get 




^ 27 rAi?+Ai?_i?+i?_ 

Av = -. 

Ra 

Hence the generation probability is 

genio ^ 4TrAR+AR_R+R_ ' 

This implies 


(A8) 


(A9) 


(AlO) 


-Pace (a b) = min 


P{b) bg 

’ p{a) bb 


c47rAi?+Ai?_i?_|.i?_ Rb 

Ra cAttAR+AR-R+R- 


p{b) bg Rb 

’ p{a) bb Ra 


(All) 


Note that this result holds regardless of whether i?+ = 
i?_ or not. In other words, we can use the same accep¬ 
tance criterion as for the case of fixed bond lengths. 
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